Although the following course relies heavily on DataCamp, I am going to add courses from the R swirl package too. Why? Because I want to get the best of both worlds.
The swirl courses are a bit difficult at first, but nothing that can't be done. It's just my way of practising R programming.
Swirl: R Programming
"First things first!"
Step 1: Get R
(Copy-pasting this part: why reinvent the wheel?)
In order to run swirl, you must have R 3.1.0 or later installed on your computer.
If you need to install R, you can do so here.
For help installing R, check out one of the following videos (courtesy of Roger Peng at Johns Hopkins Biostatistics):
Step 2: Get RStudio!
This step is not mandatory. I am using an R kernel for Jupyter Notebook to add more interactivity, as I am a kinesthetic learner, but swirl courses can also be taken in RStudio, an IDE for running R scripts and much more.
Here's a link to the IDE: RStudio
Step 3: Install Swirl
You might wonder: what is swirl? In short, it's an R package for learning R, inside R. Cool!
In the RStudio console, after the > prompt, type:
install.packages("swirl")
Patience is key: it takes a minute or so to download the swirl package from CRAN.
Step 4: Start Swirl
Now load the package by typing in the console:
library(swirl)
Call the swirl() function to begin the interactive session.
swirl()
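Before starting a session, it can help to confirm the prerequisites from the steps above. This is only a minimal sketch and not part of the swirl instructions; the variable names r_ok and swirl_installed are made up for illustration.

```r
# Is the running R at least version 3.1.0, the minimum swirl asks for?
r_ok <- getRversion() >= "3.1.0"

# requireNamespace() reports whether a package is installed
# without attaching it to the session.
swirl_installed <- requireNamespace("swirl", quietly = TRUE)

if (!r_ok) {
  message("R is older than 3.1.0; update R first.")
} else if (!swirl_installed) {
  message('swirl is not installed yet; run install.packages("swirl").')
} else {
  message("All set: run library(swirl) and then swirl().")
}
```

The check only prints a hint; it does not install or start anything by itself.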
Step 5: Installing an Interactive Course
This is outside the scope of what I am practising, but just to be helpful: we can install more interactive courses.
The following is the link to more courses.
Want more? Here's the whole Swirl Course Network.
That's it!
[RQ1: ] What are the three advantages of using R?
Ans:
Publication quality visualizations.
R packages.
Reproducibility, using R scripts.
[RQ2: ] What are the two reasons for using R-scripts?
Ans:
Reproducibility.
Automate your work.
[RQ3: ] Which symbol should you use to place comments in R?
Ans: The pound sign #.
[RQ4: ] Which R function should you use to remove a variable from the workspace in R?
Ans: To remove a variable (or any R object) from the workspace, we can use the rm() function. Type in the console: rm("variable/object name")
Note: Beware! Doing so will completely remove that object from R's workspace.
In [1]:
####################################################
# Title: How it works #
# -------------------------------------------------#
# About: Getting to know R #
# -------------------------------------------------#
# Instructions: #
# > Add another line of code that calculates #
# the sum of 6 and 12, hit Enter if in RStudio, #
# else hit Ctrl + Enter in Jupyter Notebook #
####################################################
6 + 12
Preface:
Adding comments to your code is extremely important to make sure that you and others can understand what your code is about. R makes use of the # sign to add comments, just like Twitter!
It's important to note that comments are not run as R code, so they will not influence your result. For example, # Calculate 3 + 4 in the code below is a comment and is ignored during execution.
Instructions:
In [2]:
#####################################################
# Title: Documenting your code, and we are doing it!#
# --------------------------------------------------#
#####################################################
# Calculate 3 + 4
3 + 4
# Calculate 6 + 12
6 + 12
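In its most basic form, R can be used as a simple calculator. Consider the following arithmetic operators:
Addition: +
Subtraction: -
Multiplication: *
Division: /
Exponentiation: ^
Modulo: %%
The last two might need some explaining:
The ^ operator raises the number to its left to the power of the number to its right: for example, 3^2 equals 9.
The %% operator (modulo) returns the remainder of dividing the number to its left by the number to its right: for example, 5 %% 3 equals 2.
Pop Quiz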
[Swirl] Why do we use a programming language as opposed to a calculator?
Ans: "To automate some process, or avoid unnecessary repetition."
Instructions:
Type 2^5 in the editor to calculate 2 to the power 5.
Type 28 %% 6 to calculate 28 modulo 6.
Click 'Submit Answer' and have a look at the R output in the console.
In [3]:
#####################################################
# Title: Little arithmetics with R #
# --------------------------------------------------#
# About: Doing some basic arithmetic with R #
#####################################################
# Addition
5 + 5
# Subtraction
5 - 5
# Multiplication
3 * 5
# Division
(5 + 5) / 2
# Exponentiation
2^5
# Modulo
28 %% 6
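To make the modulo a little more concrete, here is a small sketch that pairs %% with R's integer-division operator %/%, which the exercise above does not cover. The variable names are only illustrative.

```r
a <- 28
b <- 6

quotient  <- a %/% b   # integer division: how many whole times b fits into a
remainder <- a %% b    # modulo: what is left over after that

quotient
remainder

# Putting the two pieces back together recovers the original number
quotient * b + remainder == a
```

Here 6 fits into 28 four whole times, leaving 4, so the last line evaluates to TRUE.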
There are things that make R the awesome and immensely popular language that it is today. On the other hand, there are also aspects about R that are less attractive.
Which of the following statements are true regarding this statistical programming language developed by Ihaka and Gentleman in the nineties?
1. As opposed to SAS and SPSS, R is completely open-source.
2. R is open-source, but it's hard to share your code with others since R uses a command-line interface.
3. It typically takes a long time for new and updated R packages to be released and made available to the public.
4. R is easy to use, but this comes at the cost of limited graphical abilities.
5. R works well with large data sets, if the code is properly written and the data fits into the working memory.
Ans: Statements 1 and 5.
As discussed earlier, part of the point of using a programming language is to automate processes. To do so, we need to create variables that store a value for reuse in later work, which also adds reproducibility to our work.
In the simplest sense, a variable allows us to store a value or object in R.
In R, we use the assignment operator <- to assign a value to a variable.
Instructions:
Assign the value 42 to x
Print out the value of the variable x
In [6]:
########################################################
# Title: Variable Assignment #
# -----------------------------------------------------#
# About: Assigning variables with the <- operator. #
# We can use = operator too! #
# -----------------------------------------------------#
# Useful Mnemonic, think of <- as an arrow! #
########################################################
# Assign the value 42 to x
x <- 42
# Print out the value of the variable x
print(x)
Pinch:
Notice that R does not print anything when a line only assigns a value, such as x <- 42. When you use the assignment operator, R assumes that you don't want to see the result immediately, but rather that you intend to use the result for something else later on.
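The printing behaviour of assignment can be seen in a short sketch. The variable y is introduced here purely for illustration.

```r
# Assignment alone prints nothing
x <- 42

# Typing the name (or calling print) shows the stored value
x

# Wrapping an assignment in parentheses assigns and prints in one step
(y <- x + 1)
```

The parenthesised form is handy when you want to store a value and check it immediately.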
Suppose you have a fruit basket with five apples. As a data analyst in training, you want to store the number of apples in a variable with the name my_apples.
Instructions:
Using <-, assign the value 5 to my_apples below the first comment.
Type my_apples below the second comment. This will print out the value of my_apples.
Click 'Submit Answer', and look at the console: you see that the number 5 is printed. So R now links the variable my_apples to the value 5.
In [7]:
# Assign the value 5 to the variable called my_apples
my_apples <- 5
# Print out the value of the variable my_apples
my_apples
Every tasty fruit basket needs oranges, so you decide to add six oranges. As a data analyst, your reflex is to immediately create the variable my_oranges and assign the value 6 to it.
Next, you want to calculate how many pieces of fruit you have in total. Since you have given meaningful names to these values, you can now code this in a clear way:
my_apples + my_oranges
Instructions:
Assign to my_oranges the value 6.
Add the variables my_apples and my_oranges and have R simply print the result.
Combine the variables my_apples and my_oranges into a new variable my_fruit, which is the total number of pieces of fruit in your fruit basket.
In [8]:
# Assign a value to the variables my_apples and my_oranges
my_apples <- 5
my_oranges <- 6
# Add these two variables together and print the result
my_apples + my_oranges
# Create the variable my_fruit
my_fruit <- my_apples + my_oranges
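One point worth seeing in code: a variable stores the value computed at assignment time, not the formula. The sketch below reuses the names from the exercise; the step of eating two apples is a made-up scenario.

```r
my_apples  <- 5
my_oranges <- 6
my_fruit   <- my_apples + my_oranges   # 11 at this point

# Eat two apples
my_apples <- my_apples - 2

# my_fruit still holds the old total; it is not updated automatically
my_fruit

# Recompute to bring the total up to date
my_fruit <- my_apples + my_oranges
my_fruit
```

After the recomputation, my_fruit reflects the three remaining apples plus six oranges.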
If you assign a value to a variable, this variable is stored in the workspace. It's the place where all user-defined variables and objects live. The command ls() lists the contents of this workspace, and rm() removes objects from it:
a <- 1
b <- 2
ls()
rm(a)
ls()
The first two lines create the variables a and b.
Calling ls() shows all the objects currently in the workspace.
An object can be removed via rm("object name").
After removing a, ls() will show only one remaining object, i.e. b, in the workspace.
Both objects can be removed at once via `rm(a, b)`.
The entire workspace can be cleared via rm(list = ls()).
Instructions:
Create a variable horses, equal to 3.
Create a variable dogs, which you set to 7.
Create a variable animals, that is equal to the sum of horses and dogs.
Remove the dogs variable from the workspace.
Inspect the workspace once more to check that only horses and animals remain.
In [9]:
# Clear the entire workspace
rm(list = ls())
# List the contents of your workspace
ls()
# Create the variable horses
horses <- 3
# Create the variable dogs
dogs <- 7
# Create the variable animals
animals <- horses + dogs
# Inspect the contents of the workspace again
ls()
# Remove dogs from the workspace
rm(dogs)
# Inspect the objects in your workspace once more
ls()
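As a complement to ls(), base R's exists() can confirm whether a single name is still defined. This is a minimal sketch; the result variables had_dogs, has_dogs and has_horses are hypothetical names used only here.

```r
horses <- 3
dogs   <- 7

had_dogs <- exists("dogs")      # TRUE: dogs is currently defined
rm(dogs)                        # remove it by its bare name
has_dogs <- exists("dogs")      # FALSE: dogs is gone

rm("horses")                    # rm() also accepts the name as a string
has_horses <- exists("horses")

c(had_dogs, has_dogs, has_horses)
```

Checking one name with exists() avoids scanning the whole ls() listing by eye.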
The notes are available as handouts. The content ahead is just exercises and lab content.
RQ1: Which R function should you use to get the variable type?
Ans: class()
RQ2: Which two of the following variables are logical values?
Ans:
TRUE
NA
RQ3: What is the purpose of the is.*() function?
Ans: It is used to check whether a variable is of a certain type.
RQ4: What is the purpose of the as.*() function?
Ans: To transform a variable from one type to another.
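The three answers above fit together in a few lines. The following is a small sketch with a made-up value; value and value_num are illustrative names.

```r
value <- "3.14"                 # a number stored as text

class(value)                    # reports the type: "character"
is.numeric(value)               # FALSE, it is still text
is.character(value)             # TRUE

value_num <- as.numeric(value)  # transform the text into a number
class(value_num)                # "numeric"
is.numeric(value_num)           # TRUE
```

In short: class() asks what the type is, is.*() asks a yes/no question about the type, and as.*() changes it.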
Objective:
Basic Data Types
5. Coercion for the sake of cleaning
Preface
Some of R's most basic types to get started are:
Decimal values like 4.5 are called numerics.
Whole numbers like 4L are called integers. Integers are also numerics.
Boolean values (TRUE or FALSE) are called logical.
Text (or string) values are called characters.
Note how the quotation marks indicate that "some text" is a character.
Instructions:
Change the value of the:
my_numeric variable to 42.
my_character variable to "forty-two".
my_logical variable to FALSE.
Note that R is case sensitive!
In [ ]:
#####################################################
# Title: Basic Data Types #
# --------------------------------------------------#
# About: Practising with data types in R #
#####################################################
# What is the answer to the universe?
my_numeric <- 42 # Interesting cue to the SETI project!
# The quotation marks indicate that the variable is of type character
my_character <- "forty-two"
# Change the value of my_logical
my_logical <- FALSE
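To see the basic types side by side, class() can be called directly on literal values. This is just an illustrative sketch.

```r
class(4.5)           # "numeric"
class(4L)            # "integer": the L suffix asks for an integer
class(4)             # "numeric": without the L, R stores a double
class(TRUE)          # "logical"
class("some text")   # "character"

# Integers still count as numerics
is.numeric(4L)       # TRUE
```

The difference between 4 and 4L rarely matters in beginner exercises, but it explains why class(4) does not report "integer".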
Preface:
Common knowledge tells you not to add apples and oranges. But hey, that is what you just did, no :-)?
The my_apples and my_oranges variables both contained a number in the previous exercise. The + operator works with numeric variables in R. If you really tried to add "apples" and "oranges", and assigned a text value to the variable my_oranges (see the editor), you would be trying to assign the addition of a numeric and a character variable to the variable my_fruit. This is not possible.
Instructions:
In [1]:
# Assign a value to the variable called my_apples
my_apples <- 5
# Print out the value of my_apples
my_apples
# Assign a value to the variable my_oranges and print it out
my_oranges <- 6 #changed "six" to 6.
my_oranges
# New variable that contains the total amount of fruit
my_fruit <- my_apples + my_oranges
my_fruit
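For comparison, here is a sketch of what happens when my_oranges holds text instead of a number. The tryCatch() wrapper is my own addition so that the error message is captured as a string rather than stopping the notebook; result is a hypothetical variable name.

```r
my_apples  <- 5
my_oranges <- "six"   # text, not a number

# Adding a numeric and a character raises an error;
# tryCatch() captures its message instead of halting execution.
result <- tryCatch(
  my_apples + my_oranges,
  error = function(e) conditionMessage(e)
)

result
```

The captured message explains that + received a non-numeric argument, which is why the exercise changes "six" to 6.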
Preface:
Do you remember that when you added 5 + "six", you got an error due to a mismatch in data types? You can avoid such embarrassing situations by checking the data type of a variable beforehand. You can do this as follows:
class(some_variable_name)
In the workspace (you can inspect it by typing ls() in the console), some variables have already been defined. Which statement concerning these variables is correct?
Ans: a's class is numeric, b is a character, c is a logical.
Preface:
Coercion lets you transform your data from one type to another. Next to the class() function and the is.*() functions, you can use the as.*() functions to force data to change types. For example,
var <- "3"
var_num <- as.numeric(var)
converts the character string "3" in var to a numeric 3 and assigns it to var_num. Beware however, that it is not always possible to convert the types without information loss or errors:
as.integer("4.5")
as.numeric("three")
The first line converts the character string "4.5" to the integer 4, dropping the fractional part. The second converts the character string "three" to NA and issues a warning.
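Failed conversions can be spotted with is.na(). The sketch below uses a made-up vector raw; suppressWarnings() is used only to keep the output quiet and is not required.

```r
as.integer("4.5")     # 4: the fractional part is lost

bad <- suppressWarnings(as.numeric("three"))
is.na(bad)            # TRUE: the conversion failed

# Find which entries of a vector could not be converted
raw  <- c("1", "2", "three", "4.5")
nums <- suppressWarnings(as.numeric(raw))
raw[is.na(nums)]      # the entries that became NA
```

Checking for NA right after a coercion is a cheap way to catch values that did not survive the conversion.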
Instructions:
Convert var1, a logical, to a character and assign it to the variable var1_char.
See whether var1_char actually is a character by using the is.character() function on it.
Convert var2, a numeric, to a logical and assign it to the variable var2_log.
Inspect the class of var2_log using class().
Coerce var3 to a numeric and assign the result to var3_num. Was it successful?
In [3]:
# Create variables var1, var2 and var3
var1 <- TRUE
var2 <- 0.3
var3 <- "i"
# Convert var1 to a character: var1_char
var1_char <- as.character(var1)
var1_char
# See whether var1_char is a character
is.character(var1_char)
# Convert var2 to a logical: var2_log
var2_log <- as.logical(var2)
var2_log
# Inspect the class of var2_log
class(var2_log)
# Coerce var3 to a numeric: var3_num
var3_num <- as.numeric(var3)
var3_num
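The result for var2 may look surprising, so here is a short sketch of how as.logical() treats a few different inputs. The inputs are chosen only for illustration.

```r
# Numbers: zero becomes FALSE, any other number becomes TRUE
as.logical(0)        # FALSE
as.logical(0.3)      # TRUE
as.logical(-2)       # TRUE

# Strings: only a few spellings are recognised
as.logical("TRUE")   # TRUE
as.logical("T")      # TRUE
as.logical("yes")    # NA: not a recognised spelling

# And back again: logicals become their text form
as.character(TRUE)   # "TRUE"
```

This is why 0.3 turns into TRUE in the exercise, while an arbitrary string such as "i" cannot be turned into a number.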
Preface:
When might coercion come in handy?
When dealing with "messy" datasets.
Instructions:
Use as.numeric() to convert the character age; assign the result to a new variable age_clean.
Using as.logical(), convert the numeric employed and store the result in a new variable employed_clean.
Using the as.numeric() function, convert the respondent's salary to a numeric; assign the resulting numeric to the variable salary_clean.
In [ ]:
# Convert age to numeric: age_clean
age_clean <- as.numeric(age)
# Convert employed to logical: employed_clean
employed_clean <- as.logical(employed)
# Convert salary to numeric: salary_clean
salary_clean <- as.numeric(salary)
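The cell above relies on age, employed and salary being predefined by the exercise, so it will not run in a fresh session. Below is a self-contained sketch with hypothetical raw values; the actual values from the exercise are not shown in these notes.

```r
# Hypothetical "messy" inputs, assumed for illustration only
age      <- "27"      # a number stored as text
employed <- 1         # 1/0 used in place of TRUE/FALSE
salary   <- "55000"   # another number stored as text

# Coerce each one to the type it should have had
age_clean      <- as.numeric(age)
employed_clean <- as.logical(employed)
salary_clean   <- as.numeric(salary)

# Confirm the new types
class(age_clean)
class(employed_clean)
class(salary_clean)
```

With the cleaned variables, arithmetic such as salary_clean / 12 works, which it would not on the original text value.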